Skip to content

[runtime] Add page cache metrics#3544

Closed
roberto-bayardo wants to merge 49 commits into
mainfrom
page-fault-metric
Closed

[runtime] Add page cache metrics#3544
roberto-bayardo wants to merge 49 commits into
mainfrom
page-fault-metric

Conversation

@roberto-bayardo
Copy link
Copy Markdown
Collaborator

@roberto-bayardo roberto-bayardo commented Apr 7, 2026

  • Add page_faults metric to CacheRef (page cache), incremented on every cache miss that enters the fault handler
  • Add page_evictions metric to CacheRef that is incremented when an old page has to be evicted to make room for a new one.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 7, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
commonware-mcp be6cc74 Apr 22 2026, 05:36 AM

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 7, 2026

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: be6cc74
Status: ✅  Deploy successful!
Preview URL: https://8b6b8823.monorepo-eu0.pages.dev
Branch Preview URL: https://page-fault-metric.monorepo-eu0.pages.dev

View logs

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds observability for the paged page-cache by introducing a page_faults Prometheus counter and wiring metric registration through the runtime Metrics interface (via BufferPooler).

Changes:

  • Add a page_faults counter to CacheRef, incremented when a cache miss enters the page-fault path.
  • Change CacheRef::new to accept a Metrics context and register the new counter.
  • Make BufferPooler extend Metrics so callers using from_pooler don’t need to thread a separate metrics parameter.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
runtime/src/utils/buffer/paged/cache.rs Registers and increments a new page_faults counter in the page-cache fault handler; updates constructor API and tests.
runtime/src/utils/buffer/paged/append.rs Updates a test call site to the new CacheRef::new(&impl Metrics, ...) signature.
runtime/src/lib.rs Updates BufferPooler to be a supertrait of Metrics to support metric registration from pooler-based construction.

Comment thread runtime/src/utils/buffer/paged/cache.rs Outdated
Comment thread runtime/src/lib.rs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Page fault counter incremented before confirming actual fault
    • Moved page_faults.inc() to after the write-lock cache recheck so the counter only increments for confirmed page faults, not when another task cached the page concurrently.

Create PR

Or push these changes by commenting:

@cursor push f4af0d3b46
Preview (f4af0d3b46)
diff --git a/runtime/src/utils/buffer/paged/cache.rs b/runtime/src/utils/buffer/paged/cache.rs
--- a/runtime/src/utils/buffer/paged/cache.rs
+++ b/runtime/src/utils/buffer/paged/cache.rs
@@ -309,8 +309,6 @@
 
         let (page_num, offset_in_page) = Cache::offset_to_page(self.page_size, offset);
         let offset_in_page = offset_in_page as usize;
-        self.page_faults.inc();
-        trace!(page_num, blob_id, "page fault");
 
         // Create or clone a future that retrieves the desired page from the underlying blob. This
         // requires a write lock on the page cache since we may need to modify `page_fetches` if
@@ -325,6 +323,10 @@
                 return Ok(count);
             }
 
+            // Only count as a page fault after confirming the page is not in cache.
+            self.page_faults.inc();
+            trace!(page_num, blob_id, "page fault");
+
             let key = (blob_id, page_num);
             match cache.page_fetches.entry(key) {
                 Entry::Occupied(o) => {

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread runtime/src/utils/buffer/paged/cache.rs
Comment thread runtime/src/utils/buffer/paged/cache.rs Outdated
@roberto-bayardo roberto-bayardo force-pushed the page-fault-metric branch 9 times, most recently from 0ff6666 to 2c1f46f Compare April 8, 2026 01:19
@roberto-bayardo roberto-bayardo moved this to Ready for Review in Tracker Apr 8, 2026
@roberto-bayardo roberto-bayardo requested a review from Copilot April 8, 2026 16:34
Comment thread storage/src/merkle/mmr/journaled.rs Outdated
let init_cache =
CacheRef::from_pooler(context.with_label("init"), PAGE_SIZE, PAGE_CACHE_SIZE);
let mut mmr = Mmr::<_, Digest>::init(
context.with_label("init"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be same name?

Comment thread storage/src/merkle/mmr/journaled.rs Outdated
// init_sync with range starting beyond the existing data triggers the
// "fresh start" path (clear_to_size).
let sync_cache =
CacheRef::from_pooler(context.with_label("sync"), PAGE_SIZE, PAGE_CACHE_SIZE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same name as MMR below

Comment thread storage/src/ordinal/storage.rs Outdated
let mut replay_blob =
ReadBuffer::from_pooler(&context, blob.clone(), *size, config.replay_buffer);
let mut replay_blob = ReadBuffer::from_pooler(
context.with_label("replay"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could lead to conflicting context?

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Double "network" label in setup_network_with_participants calls
    • Changed all 16 call sites from context.with_label("network") to context.clone() since setup_network_with_participants already applies with_label("network") internally.

Create PR

Or push these changes by commenting:

@cursor push 074a42aec8
Preview (074a42aec8)
diff --git a/consensus/src/marshal/mocks/harness.rs b/consensus/src/marshal/mocks/harness.rs
--- a/consensus/src/marshal/mocks/harness.rs
+++ b/consensus/src/marshal/mocks/harness.rs
@@ -1994,7 +1994,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2178,7 +2178,7 @@
         let attacker = participants[1].clone();
         let peers = vec![victim.clone(), attacker.clone()];
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             peers.clone(),
         )
@@ -2297,7 +2297,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2374,7 +2374,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2470,7 +2470,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2557,7 +2557,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2767,7 +2767,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2854,7 +2854,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -2932,7 +2932,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -3027,7 +3027,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -3092,7 +3092,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -3180,7 +3180,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(3),
             participants.clone(),
         )
@@ -3293,7 +3293,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -3365,7 +3365,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -3498,7 +3498,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )
@@ -3587,7 +3587,7 @@
             ..
         } = bls12381_threshold_vrf::fixture::<V, _>(&mut context, NAMESPACE, NUM_VALIDATORS);
         let mut oracle = setup_network_with_participants(
-            context.with_label("network"),
+            context.clone(),
             NZUsize!(1),
             participants.clone(),
         )

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread consensus/src/marshal/mocks/harness.rs
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ed497e8. Configure here.

Comment thread consensus/src/marshal/coding/marshaled.rs
};
let (engine, engine_mailbox) =
Engine::<_, PublicKey, TestMessage, _>::new(context.clone(), config);
Engine::<_, PublicKey, TestMessage, _>::new(context.child("engine"), config);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just take context from above as-is?

};
let (engine, mailbox) =
Engine::<_, PublicKey, TestMessage, _>::new(engine_context.clone(), config);
Engine::<_, PublicKey, TestMessage, _>::new(engine_context.child("engine"), config);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just take context as-is?

mailboxes.insert(peer_b.clone(), mailbox_b);
for (peer, network) in registrations {
let ctx = context.with_label(&format!("peer_{}", peer));
let ctx = context.child("peer").with_attribute("peer", &peer);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .child(engine)?

# Conflicts:
#	consensus/src/marshal/coding/marshaled.rs
#	consensus/src/marshal/standard/deferred.rs
#	consensus/src/marshal/standard/inline.rs
#	storage/src/qmdb/benches/merkleize.rs
# Conflicts:
#	runtime/src/utils/mod.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 97.97948% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.92%. Comparing base (459842f) to head (83ba05b).

Files with missing lines Patch % Lines
consensus/src/marshal/standard/inline.rs 80.35% 11 Missing ⚠️
p2p/src/simulated/network.rs 88.23% 5 Missing and 3 partials ⚠️
resolver/src/p2p/fetcher.rs 75.00% 6 Missing ⚠️
p2p/src/authenticated/discovery/channels.rs 0.00% 5 Missing ⚠️
p2p/src/authenticated/lookup/channels.rs 0.00% 5 Missing ⚠️
runtime/src/lib.rs 98.41% 3 Missing and 1 partial ⚠️
consensus/src/simplex/actors/resolver/actor.rs 62.50% 0 Missing and 3 partials ⚠️
consensus/src/simplex/mocks/reporter.rs 92.50% 0 Missing and 3 partials ⚠️
...p/src/authenticated/discovery/actors/peer/actor.rs 96.15% 3 Missing ⚠️
runtime/src/tokio/runtime.rs 96.38% 3 Missing ⚠️
... and 9 more
@@            Coverage Diff             @@
##             main    #3544      +/-   ##
==========================================
+ Coverage   95.87%   95.92%   +0.05%     
==========================================
  Files         440      440              
  Lines      172054   174371    +2317     
  Branches     4001     3995       -6     
==========================================
+ Hits       164958   167270    +2312     
- Misses       5827     5831       +4     
- Partials     1269     1270       +1     
Files with missing lines Coverage Δ
broadcast/src/buffered/engine.rs 95.30% <100.00%> (ø)
broadcast/src/buffered/metrics.rs 100.00% <100.00%> (ø)
broadcast/src/buffered/mod.rs 98.72% <100.00%> (+0.01%) ⬆️
collector/src/p2p/engine.rs 86.29% <100.00%> (-0.33%) ⬇️
collector/src/p2p/mod.rs 98.93% <100.00%> (+<0.01%) ⬆️
consensus/src/aggregation/metrics.rs 100.00% <100.00%> (ø)
consensus/src/aggregation/mod.rs 98.15% <100.00%> (-0.03%) ⬇️
consensus/src/marshal/coding/mod.rs 98.54% <100.00%> (+0.03%) ⬆️
consensus/src/marshal/coding/shards/engine.rs 97.07% <100.00%> (+<0.01%) ⬆️
consensus/src/marshal/coding/shards/metrics.rs 100.00% <100.00%> (ø)
... and 135 more

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 459842f...83ba05b. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants